
Integrate ndarray-parallel and make rayon an optional feature #563

Merged
24 commits merged into master from integrate-rayon on Dec 3, 2018

Conversation

@bluss (Member) commented Nov 24, 2018

This moves all of the ndarray-parallel functionality directly into ndarray itself.

This means the parallelization features are more visible (in the docs by default) and more accessible. The user still has to opt in to use them and bring the rayon traits into scope.

We also get rid of the extra shims in the integration: we use the rayon IntoParallel* traits directly and can provide inherent methods for par_apply and so on.

Fixes #551
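
For illustration, a minimal sketch of the kind of usage this enables (hypothetical example: it assumes the `rayon` crate feature is enabled, and the shapes and values are made up):

use ndarray::{Array2, Zip};

fn main() {
    let mut a = Array2::<f64>::zeros((128, 128));
    let b = Array2::<f64>::ones((128, 128));

    // With the `rayon` feature enabled, `par_apply` is an inherent method on `Zip`,
    // so no extension trait has to be imported for this call.
    Zip::from(&mut a).and(&b).par_apply(|a, b| *a += *b);

    assert_eq!(a[(0, 0)], 1.0);
}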

@bluss (Member, Author) commented Nov 25, 2018

Here I'm wondering: should we keep the parallel prelude separate, or pack it into the regular ndarray prelude?

It only contains this:

pub mod prelude {
    pub use rayon::prelude::{
        ParallelIterator, IndexedParallelIterator, IntoParallelIterator,
        IntoParallelRefIterator, IntoParallelRefMutIterator,
    };
}
  1. I think it's a good idea to have these traits in a prelude somewhere; we don't want the user to have to `extern crate rayon` or add rayon as a direct dependency if they are purely working in ndarray for the moment.
  2. I think we can move these into ndarray::prelude, conditional on the rayon feature. They will show up or not show up in the prelude depending on the feature flag, but given Rust's rules for glob imports they only collide if the items are actually used, so it shouldn't be an incompatibility (see the small sketch below). Thoughts?
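
As a standalone sketch of the glob-import rule relied on above (the modules here are made up and unrelated to the actual preludes): two globs may export an item with the same name, and the program still compiles as long as the ambiguous name is never used.

mod a {
    pub struct Axis;
    pub struct OnlyInA;
}
mod b {
    pub struct Axis;
}

// Both globs bring an `Axis` into scope; that alone is not an error.
use a::*;
use b::*;

fn main() {
    let _x = OnlyInA; // fine: this name is unambiguous
    // let _y = Axis; // would fail to compile: `Axis` is ambiguous between the globs
}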

@bluss (Member, Author) commented Nov 25, 2018

cc @cuviper Any thoughts about our integration with rayon and our parallel iterators? We're just moving code here and pulling in the rayon 1.0 dependency in the main crate, the way it's all supposed to be.

@jturner314 (Member) commented Nov 27, 2018

I think we can move these into ndarray::prelude? Conditional on the rayon feature.

I lean towards keeping them separate, but putting them in ndarray::prelude would be fine too.

Edit: Earlier, I suggested adding pub use rayon::prelude as parallel to ndarray::prelude, but I no longer think that's a good idea. IMO, if there's a ndarray::parallel module, it would be better to put the parallel prelude at ndarray::parallel::prelude.

Another interesting idea is to reexport all of rayon at the top level of ndarray:

// ndarray/src/lib.rs
pub use rayon;

although users could just depend directly on rayon instead, so I'm not sure if it's worth doing this.
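
If that re-export existed (hypothetical; it is only being proposed here), downstream code could reach rayon through ndarray without declaring its own rayon dependency, e.g.:

// Assumes ndarray did `pub use rayon;` at its crate root, as sketched above.
use ndarray::rayon::prelude::*;

fn main() {
    let sum: i32 = (0..1_000).into_par_iter().sum();
    assert_eq!(sum, 499_500);
}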

@bluss (Member, Author) commented Nov 27, 2018

Re-exporting rayon seems worthwhile.

@jturner314 (Member) commented:

One other idea for the prelude is to re-export the traits in ndarray::prelude so that their methods can be used easily, but hide them from the docs and mangle the trait names so that the originals need to be explicitly imported to be used directly:

// ndarray/src/prelude.rs
#[doc(hidden)]
pub use rayon::prelude::{
    ParallelIterator as __rayon_ParallelIterator,
    IndexedParallelIterator as __rayon_IndexedParallelIterator,
    IntoParallelIterator as __rayon_IntoParallelIterator,
    IntoParallelRefIterator as __rayon_IntoParallelRefIterator,
    IntoParallelRefMutIterator as __rayon_IntoParallelRefMutIterator,
};

I think std does this for some traits.
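
A hypothetical sketch of what that renaming would buy; note that this approach is only being floated here, and the example also assumes the `rayon` feature plus the parallel iterator impl for `&Array` discussed elsewhere in this PR:

use ndarray::prelude::*; // would bring the renamed rayon traits into scope

fn main() {
    let a = Array2::<f64>::ones((100, 100));
    // `par_iter` would resolve through the hidden `__rayon_IntoParallelRefIterator`
    // re-export, so the user never has to name the trait explicitly.
    let total: f64 = a.par_iter().sum();
    assert_eq!(total, 10_000.0);
}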

@bluss (Member, Author) commented Nov 27, 2018

We're merging breaking changes to prepare the next major release, right?

Taken together with this being an opt-in feature, putting the rayon traits in the regular prelude seems like a breaking-change / method-name-clash hazard.

I think it's pretty obvious we should have rayon as optional, but maybe I'm wrong?

@jturner314 (Member) commented:

We're merging breaking changes to prepare the next major release, right?

Yes, that makes sense to me.

I think it's pretty obvious we should have rayon as optional, but maybe I'm wrong?

I think it's a good idea to make rayon optional for the time being, given the difference in compile times; we can always make it required later.

@rcarson3 (Contributor) commented:

I think it's good to keep rayon optional even in the future. In my opinion, it's best not to force particular parallel dependencies on people. Groups can have very opinionated ideas about how best to parallelize their code. I've especially seen this in the C++ and C realm as you move toward exascale-type work, where you find very domain-specific solutions that aren't obvious but lead to a noticeable performance difference.

@bluss (Member, Author) commented Nov 28, 2018

@rcarson3 Fair enough. We don't have a plan for transparent/automatic parallelization anywhere, though; it's all just separate methods and parallel iterators one can use so far.

@rcarson3 (Contributor) commented Nov 28, 2018

we don't have a plan for transparent/automatic parallelization anywhere, though; it's all just separate methods and parallel iterators one can use so far.

@bluss I'm aware of this; I've had the pleasure of using a few of the parallel iterators to get some easy parallelization out of my code :)

I guess I've become a bit more cautious about requiring one parallelization approach over another, even for parallelizing iterators/compute kernels, as I've been exposed to more and more of the different ways people have found to tackle these kinds of problems in HPC applications.

I'll probably still continue to use this crate no matter what decisions y'all end up making way down the road for the parallel stuff, since I enjoy how easy this crate has made writing numerical code in Rust :)

@jturner314 (Member) commented:

@rcarson3 Out of curiosity, have you done any HPC-type stuff in Rust? If so, what crates were useful? I have a small SLURM cluster but haven't yet tried running any Rust code across multiple machines.

@rcarson3 (Contributor) commented:

@jturner314 I haven't gotten the chance to mess around with any HPC stuff in Rust yet. Hopefully, I'll get the chance to experiment with some of this stuff in the next month or so.

Also, I've been eyeballing rsmpi as a traditional way to do HPC-type work in Rust. It looks like some people from LANL are involved, which means we might start seeing Rust used at the DOE national labs, which is exciting. I've also been looking at timely-dataflow and differential-dataflow to try to figure out whether they could be used in a similar manner to what you might use MPI for.

@jturner314 (Member) commented:

@rcarson3 Thanks for the links. The dataflow projects look particularly exciting, and relatively easy to use.

@bluss changed the title from "WIP: Integrate ndarray-parallel and make rayon an optional feature" to "Integrate ndarray-parallel and make rayon an optional feature" on Dec 1, 2018
@bluss (Member, Author) commented Dec 1, 2018

This should be ready. It uses a separate parallel prelude, which contains the rayon traits and the par_azip macro.

@bluss (Member, Author) commented Dec 1, 2018

I think the bounds on the parallel iterators that produce &mut T are too strict. They currently use T: Send + Sync, but the corresponding IntoParallelIterator for &mut [T] in rayon only requires T: Send.
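
For context, a small rayon-only illustration (a standalone sketch, not ndarray code) of why `T: Send` alone should suffice for mutable parallel iteration: `Cell<i32>` is Send but not Sync, yet rayon can mutate a slice of them in parallel because each closure call receives an exclusive `&mut` reference.

use rayon::prelude::*;
use std::cell::Cell;

fn main() {
    // Cell<i32> is Send but deliberately !Sync.
    let mut data: Vec<Cell<i32>> = (0..1_000).map(Cell::new).collect();

    // rayon's &mut [T] iterator only needs T: Send, since workers get disjoint &mut access.
    data.par_iter_mut().for_each(|c| c.set(c.get() * 2));

    let total: i32 = data.iter().map(Cell::get).sum();
    assert_eq!(total, 999_000);
}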

@cuviper (Contributor) commented Dec 2, 2018

@bluss

Any thoughts about our integration with rayon and our parallel iterators?

Looks good to me!

I noticed that views and zips are unindexed -- AFAIK their length is known, so is this just because they're hard to split at arbitrary indexes? (I have the same limitation in rayon-hash.)

I think the bounds on the parallel iterators that produce &mut T are too strict. They currently use T: Send + Sync, but the corresponding IntoParallelIterator for &mut [T] in rayon only requires T: Send.

Is there a technical reason why you're requiring Sync? Or just an oversight?

@bluss (Member, Author) commented Dec 2, 2018

@cuviper They are impossible to split at arbitrary linear indexes if they have more than one dimension; i.e., a 3 × 3 × 3 array view should be split along a good combination of axis and index.

For example, along axis 0 at index 1 (that split produces the following pieces: views of 1 × 3 × 3 and 2 × 3 × 3; see also the illustrations).

And we want to split them in a way that preserves element locality. This is a dynamic property: maybe the elements are closest to each other along axis 1 and furthest from each other along axis 0, in which case we split along axis 0. We want to increase our chances of ending up with a contiguous chunk to work on for the final single job.

We can split successively on different axes; for example, when the first axis has been split down to length 1, we go on to the current axis of furthest separation with len > 1 and split along that. A rough sketch of this strategy is below.
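
A hypothetical sketch of that splitting strategy, simplified to a fixed 3-D view; the helper name and the stride heuristic here are illustrative, not the actual implementation:

use ndarray::{ArrayView3, Axis};

// Split a view roughly in half along the axis whose elements are farthest apart
// in memory (largest absolute stride), considering only axes of length > 1.
fn split_by_locality<T>(view: ArrayView3<'_, T>) -> Option<(ArrayView3<'_, T>, ArrayView3<'_, T>)> {
    let axis = view
        .strides()
        .iter()
        .zip(view.shape())
        .enumerate()
        .filter(|&(_, (_, &len))| len > 1)
        .max_by_key(|&(_, (&stride, _))| stride.abs())
        .map(|(i, _)| i)?;
    let mid = view.len_of(Axis(axis)) / 2;
    Some(view.split_at(Axis(axis), mid))
}

fn main() {
    // For a C-order 3 × 3 × 3 array the largest stride is along axis 0,
    // so the split reproduces the 1 × 3 × 3 / 2 × 3 × 3 example above.
    let data: Vec<f64> = (0..27).map(|x| x as f64).collect();
    let arr = ndarray::Array::from_shape_vec((3, 3, 3), data).unwrap();
    let (left, right) = split_by_locality(arr.view()).unwrap();
    assert_eq!(left.shape(), &[1, 3, 3]);
    assert_eq!(right.shape(), &[2, 3, 3]);
}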

What could be added is to make more producers indexed, in their 1-D cases only, but I'm unsure if that's possible. Say we have a Parallel<ArrayView<f64, D>>; when D = Ix1 we would additionally implement IndexedParallelIterator. Since the parallel traits cooperate a bit, I'm not sure if that works out.

I think the Sync thing is just an oversight / an old “quick solution to be safe”; I don't remember any reason we require it.

@cuviper (Contributor) commented Dec 2, 2018

we want to split them in a way that preserves element locality.

I see -- and I suppose direct Array access can just split the backing storage, regardless of the formal shape? If that's the case, then it seems like you could do the same for iterating &Array and &mut Array, rather than going through views where that contiguity is no longer guaranteed.

Edit: I think I mistook the macros for the indexed AxisIter part as being for Array directly. Do you not have IntoParallelIterator for Array at all?

What could be added is to make more producers indexed, in their 1-D cases only, but I'm unsure if that's possible. Say we have a Parallel<ArrayView<f64, D>>; when D = Ix1 we would additionally implement IndexedParallelIterator. Since the parallel traits cooperate a bit, I'm not sure if that works out.

The opt_len hack might be hard without specialization, unless you separately implemented each possible D for ParallelIterator, but that's just an optimization for collect to act like collect_vec.

I think it would work fine to just implement IndexedParallelIterator for the 1D case, and leave ParallelIterator as a more generic unindexed implementation. If that doesn't work for some reason, I'm willing to consider that as a rayon bug (whether or not it's possible to fix).

@jturner314 (Member) left a review:

Merging this stuff into ndarray makes things a lot nicer. Removing the Ndarray* traits is especially satisfying. :)

I've added some comments, primarily about docs. Some additional docs changes:

  • Remove "See also the [`ndarray-parallel`] crate for integration with rayon." from src/lib.rs.
  • Add the rayon feature flag to the list in src/lib.rs.

I'd also remove the parallel directory (or maybe delete everything except parallel/README.md and update it to say that the ndarray-parallel crate is deprecated). Doing so makes the Git diff a lot easier to read, since it lets Git recognize that most files were renamed with only small changes.

Review threads (resolved): src/parallel/mod.rs, src/parallel/zipmacro.rs, src/parallel/ext_traits.rs

One inline review comment was left on this snippet:
/// `Zip`.
///
/// Requires crate feature `rayon`.
pub fn par_apply<F>(self, function: F)
@jturner314 (Member) commented:

Consider moving par_apply out of this module and into src/zip/mod.rs immediately after apply. (This would make the docs somewhat clearer.)

@bluss (Member, Author) replied:

OK, this is left for later.

@bluss (Member, Author) commented Dec 3, 2018

@cuviper We have parallel iterators for array views and for &Array and &mut Array, but not for Array itself, just as we don't have a regular by-value iterator.

Implementing indexed parallel iterators is left for later. This PR is mostly about moving the code from a separate crate into the main crate.

@bluss merged commit ad714f8 into master on Dec 3, 2018
@bluss deleted the integrate-rayon branch on Dec 3, 2018
@cuviper (Contributor) commented Dec 3, 2018

We have parallel iterators for array views, for &Array and &mut Array, but not Array itself. Just like we don't have a regular by value iterator.

Implementing Indexed parallel iterators is left for later.

OK, but note that having <&Array as IntoParallelIterator>::Iter = Parallel<ArrayView> will restrict what you can do at the type level. Using something like Iter = Parallel<&Array> could still be implemented in terms of the view-style Producer from drive_unindexed, but would leave you the flexibility to switch to an indexed implementation later (without a breaking change).

@bluss (Member, Author) commented Dec 3, 2018

An ArrayView is to an Array as a slice is to a Vec, so I think it makes sense to implement the indexed parallel iterator on the array view; we don't have anything separate from array views that handles an &Array.

@cuviper (Contributor) commented Dec 3, 2018

Right, but if I understand your earlier comment about preserving locality, a general ArrayView is not necessarily contiguous, whereas a full &Array would be contiguous and could more easily support arbitrary splits, no?

All I'm suggesting is:

// (with lots of omissions...)
impl IntoParallelIterator for &Array {
    type Iter = Parallel<&Array>;
}
impl ParallelIterator for Parallel<&Array> {
    fn drive_unindexed<C>(self, consumer: C) -> C::Result {
        // A delayed view conversion lets you do something else later
        self.0.view().into_par_iter().drive_unindexed(consumer)
    }
}

@bluss (Member, Author) commented Dec 3, 2018

Aha, that makes sense. We'll have to look at whether we can use that for extra features. In general, arrays and views have about the same properties in everything, including support for custom strided axes, so I'm not sure I'm grasping the distinction correctly.
